Bulletin of Electrical Engineering and Informatics 
Vol. 9, No. 2, April 2020, pp. 582~587 
ISSN: 2302-9285, DOI: 10.11591/eei.v9i2.2091


Mel-log energies analysis of authentic audible intrusion 
activities in a Malaysian forest 


Amirul Sadikin Md Afendi¹, Marina Yusoff², Megawati Omar³

¹Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia
²Advanced Analytic Engineering Center (AAEC), Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, Malaysia
³Academy of Language Studies, Universiti Teknologi MARA, Malaysia











Article Info

Article history:
Received Oct 20, 2019
Revised Dec 28, 2019
Accepted Feb 8, 2020

Keywords:
Acoustic features
Anti-poaching
Audio processing
Sound event detection
Wildlife protection

ABSTRACT

Wildlife has been endangered by illegal activities, which calls for more effective surveillance measures.
Timber felling and poaching are common illegal activities but are challenging to detect. Hence, authorities
should resort to modern technologies, such as autonomous surveillance, to stop them. Audio data from
a Malaysian forest were recorded to lay a foundation for a cheaper and more practical approach. This paper
reports the collection, processing and analysis of these audio data in preparation for developing an
autonomous sound event detection system. The recording emulated possible illegal activities in a reserved
forest: the sounds of a chainsaw and a hand hatchet cutting tree trunks were recorded. A distinct pattern was
found in the Mel-log energies audio feature of each sound, which could be used to identify illegal activities.
Thus, audio-based detection is believed to be a possible approach to be employed as one of the methods to
stop illegal activities in tropical reserve forests like those in Malaysia.


This is an open access article under the CC BY-SA license. 





Corresponding Author:
Marina Yusoff,
Advanced Analytic Engineering Center (AAEC),
Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia.
Email: marinay@tmsk.uitm.edu.my








1. INTRODUCTION 

There has been a global surge in threats to wildlife, and countering them requires more sophisticated efforts.
Malaysia's wildlife conservation enforcement faces many challenges in its efforts to protect flora and fauna [1].
The threats include intrusion activities such as illegal tree felling and poaching. Recently, the perpetrators of
illegal logging have become too comfortable [2], and poachers are hunting animals to the brink of extinction [3].
It is difficult to catch poachers in the act and stop them. They are often captured only after wildlife
casualties have occurred [4].

The sounds of felling and related activities, such as chainsaws and gunshots, can be heard from afar, but
authorities often fail to locate the source in time [3]. Hence, the perpetrators can escape easily.
A gunshot can be heard over a mile away in certain circumstances [5]. The Malaysian forest
environment is unique in its composition of climate, trees and animals. Sound travels much faster in warm
air near the ground than in colder air higher up [6]. This research collects the sounds of audible intrusion
activities that can be heard within a range of 100 m. The broader aim of this research was to record
illegal activity sounds in order to explore new, feasible methods of preventing illegal activities in tropical
forest reserves. Audio detection has not yet been attempted in Malaysia. However, it is believed that
employing it could help the authorities reduce the cost of protecting wildlife.

Speech recognition research has been applied successfully in daily life. Research in Sound Event Detection
(SED) now shows promise for practical application [7]. In DCASE 2017, numerous research teams tested the
detection of rare sounds such as a baby crying, glass breaking and gunshots [8].
The DCASE 2017 challenge results show detection accuracies of up to 95% for these rare sounds [9, 10].
This suggests that SED is nearly ready for industrial use in autonomous surveillance. The present
research aims to apply SED to the detection of poacher intrusion in wildlife reserves. More broadly, the study
aims to lay the foundation for sound-based, wide-area autonomous surveillance in a Malaysian
forest environment.

The objective of this research is to record and analyse the audio of emulated illegal activities. The aim is
to discover audio feature patterns that distinguish different audible illegal activities in reserve forest regions.
These patterns would support the use of SED to monitor such illegal activities. The main concerns are to
record the sounds of audible illegal activities in a reserved forest environment and to analyse the patterns
of audio features that distinguish illegal activities from the ambience. This paper demonstrates the collection,
processing and analysis of audio data. The data analysis is intended to show whether an autonomous sound
event detection system for surveillance in a Malaysian forest can feasibly be developed.


2. MEL-LOG ENERGIES AUDIO FEATURE 

Mel-log energies (MLE) are also known as log Mel-filter energy features (LMFE) or the log Mel-scaled
spectrogram. Mel filtering reflects the frequency characteristics of the human auditory system, while the
natural logarithm compensates for the non-linear perception of loudness [11]. The feature extraction method
employed in this research is based on Mel-log energies. Research has been conducted to find better
representations that allow useful information to be extracted from audio [12]. A useful feature has a
distinctive, consistent pattern and a high signal-to-noise ratio, which allows higher detection accuracy to be
achieved [13]. However, the features have many parameters that can be tweaked and tuned for
performance [14]. MLE features are extracted using the following steps:

- Take the Fourier transform of (a windowed excerpt of) the signal.
- Map the powers of the spectrum obtained above onto the Mel scale, using triangular overlapping windows.
- Take the logs of the powers at each of the Mel frequencies.
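The three steps above can be sketched in NumPy as follows. This is a from-scratch illustration, not the speechpy implementation used in the study. Several choices are assumptions: the HTK-style Mel formula, the FFT-bin mapping, normalising power by the FFT length, and flooring before the natural logarithm. The defaults follow the parameters given in Section 3.4 and Table 1 (100 ms frames, 50 ms stride, 40 filters, 512-point FFT, 20 kHz upper band edge). The 0 Hz lower edge is only a placeholder.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumed variant): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_length, fs, low_hz, high_hz):
    """Triangular, overlapping filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(low_hz), hz_to_mel(high_hz), num_filters + 2)
    # Map each Mel edge/centre frequency to an FFT bin index.
    bins = np.floor((fft_length + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((num_filters, fft_length // 2 + 1))
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):   # rising slope
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope, peak of 1 at the centre bin
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (rows); leftover samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_log_energies(x, fs, frame_length=0.1, frame_stride=0.05,
                     num_filters=40, fft_length=512, low_hz=0.0, high_hz=20000.0):
    if high_hz > fs / 2:
        raise ValueError("high_hz must not exceed the Nyquist frequency fs / 2")
    frame_len = int(round(frame_length * fs))
    hop = int(round(frame_stride * fs))
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop) * np.hamming(frame_len)
    # Step 1: Fourier transform of each windowed frame -> power spectrum.
    # np.fft.rfft crops frames longer than fft_length to their first fft_length samples.
    power = np.abs(np.fft.rfft(frames, n=fft_length)) ** 2 / fft_length
    # Step 2: project the power spectrum onto the triangular Mel filters.
    energies = power @ mel_filterbank(num_filters, fft_length, fs, low_hz, high_hz).T
    # Step 3: natural log, floored so that empty filters do not give log(0).
    return np.log(np.maximum(energies, np.finfo(float).eps))
```

The result has one row per frame and one column per filter bank. At 44.1 kHz a 100 ms frame holds 4410 samples, which is more than 512 FFT points. NumPy's `rfft` then analyses only the first 512 samples of each windowed frame, so any reimplementation should check how `fft_length` relates to the frame length.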

Cepstral features are computed by taking the FFT of the warped log spectrum. They contain information on
the changes in the various spectral bands [15]. Cepstral features are favoured for their ability to
separate the effects of the source and the filter in a speech signal [16]. The source and filter contributions
can be told apart because they occupy different regions of the cepstral domain [17]. MLE was
chosen for this experiment because it is popularly held to retain more dimensions than MFCC.
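The MLE/MFCC relationship can be made concrete. In common MFCC pipelines, the MFCCs of a frame come from applying a DCT-II to that frame's MLE vector and keeping only the first few coefficients. MFCC is therefore a compressed view of MLE, which is one reading of MLE "holding more dimensions". The sketch below assumes an orthonormal DCT-II and 13 retained coefficients, a typical but assumed choice. It follows the usual MFCC practice of a DCT rather than the FFT mentioned above, and uses NumPy only.

```python
import numpy as np


def dct_ii_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is the k-th cosine."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= 1.0 / np.sqrt(n)   # scaling that makes the basis orthonormal
    basis[1:] *= np.sqrt(2.0 / n)
    return basis


def mfcc_from_mle(mle, num_ceps=13):
    """MLE frames (frames x filters) -> first num_ceps DCT coefficients per frame."""
    mle = np.asarray(mle, dtype=float)
    d = dct_ii_matrix(mle.shape[1])
    return (mle @ d.T)[:, :num_ceps]
```

With 40 filter banks, each frame goes from 40 MLE values to 13 MFCCs. The discarded coefficients are the detail that MLE keeps.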


3. METHODOLOGY 
The methodology consists of data collection and the analysis of audio features. The location, the sounds
collected and the equipment used in the present research are described below.


3.1. Location of authentic audio collection 

The authentic audio was collected at Endau Rompin National Park, Johor, Malaysia, on
4th August 2019. The Endau Rompin National Park region is a tropical rainforest with equatorial weather,
hot and humid throughout the year. Its yearly average rainfall is 2000 mm to 2500 mm [15]. The audio was
collected in a thick forest, that is, a place with a high density of trees and vegetation. This location
has various obstructions to sound propagation, such as tree trunks, thick foliage and undergrowth beneath
the tree canopy. Thick foliage usually dampens the sound waves emitted by the source, which can reduce
the audible distance of the sound [18, 19].


3.2. Sound collected 

The collected sounds were suggested by the Wildlife Conservation and Science (WCS) Malaysia.
WCS is an active non-governmental organisation that has made strong efforts in wildlife conservation.
Three types of sound were collected, from the following activities:
- Hatchet hitting a tree trunk 
- Forest ambience 
- Chainsaw revving 
The machete and axe were employed because poachers often use them to cut trees to
build their camps. A large axe hitting a tree can be heard a mile away in certain cases [18].
The chainsaw, in turn, is a common tool of illegal loggers. These tool sounds are considered rare in forest
reserves. The next element considered in the data collection is the ambience of the location. These sound data
consist of the natural forest environment. They are important because they constitute
the background noise.


3.3. Recording equipment 

The recording equipment consisted of a recorder and a microphone. The recorder used was a Zoom H6 Handy
Recorder paired with a LR XYH Zoom H6 Capsule Mic. Audio was recorded at a sampling rate
of 44.1 kHz with 12 V phantom power. In digital audio recording, it is recommended that the sampling rate
be at least double the maximum audio frequency generally accepted as audible to humans [19]. Thus, the research
employs a sampling rate of 44.1 kHz.
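The sampling-rate argument reduces to a Nyquist check, sketched below. The 20 kHz upper limit of human hearing is a commonly cited figure and an assumption here. The 20 kHz Mel upper band edge is taken from Table 1.

```python
def nyquist_hz(fs_hz):
    """Highest frequency representable without aliasing at sampling rate fs_hz."""
    return fs_hz / 2.0


def sampling_rate_ok(fs_hz, max_signal_hz):
    """Nyquist criterion: sample at least twice the highest frequency of interest."""
    return fs_hz >= 2.0 * max_signal_hz


FS_HZ = 44_100          # recorder setting used in the study
HUMAN_MAX_HZ = 20_000   # commonly cited upper limit of human hearing (assumption)
MEL_MAX_HZ = 20_000     # maximum Mel band edge (Table 1)

print(nyquist_hz(FS_HZ))                      # 22050.0
print(sampling_rate_ok(FS_HZ, HUMAN_MAX_HZ))  # True
print(MEL_MAX_HZ <= nyquist_hz(FS_HZ))        # True
```

So 44.1 kHz leaves about 2 kHz of headroom above the 20 kHz upper Mel band edge.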


3.4. Audio feature extraction 

Audio feature extraction is performed on the collected audio data. The features enable recognition, but
high-dimensional features can cause overfitting [20]. Mel-scale features such as MLE mimic the human
perception of sound [21], and MLE features have performed well in past SED studies. Feature selection plays
a crucial role in influencing the results [22]. MLE features are extracted from the audio files using Python
libraries: matplotlib.pyplot is used to plot the feature heat map representations, and speechpy is used to
extract the features with the parameters stated in Table 1. Specifically, the speechpy functions applied in
this research are speechpy.processing.preemphasis and speechpy.feature.lmfe. Feature extraction parameters
are the tunable variables in the extraction of features. They include the frame length, frame stride, number of
filters, FFT length, and the minimum and maximum frequencies of the Mel-filter banks. The values used in this
experiment are stated in Table 1.
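The text names speechpy.processing.preemphasis but does not give its settings. Pre-emphasis is commonly a first-order filter, y[n] = x[n] - a·x[n-1], which boosts high frequencies before framing. The sketch below implements that form as a stand-in, not as speechpy's code. The coefficient a = 0.98 and keeping the first sample unchanged are assumptions.

```python
import numpy as np


def preemphasis(signal, coefficient=0.98):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - a * x[n-1].

    The coefficient value and the first-sample convention are assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coefficient * signal[:-1])
```

A constant (DC) input is almost cancelled after the first sample, so slowly varying components are attenuated relative to fast ones.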


Table 1. MLE feature extraction parameters 


Parameter                        Description                                      Value
Frame length                     The length of each frame in seconds              100 ms
Frame stride                     The step between successive frames in seconds    50 ms
Number of filters                The number of MLE to extract                     40
Fast Fourier Transform length    Number of FFT points                             512
Min frequency                    Minimum band edge of Mel-filters
Max frequency                    Maximum band edge of Mel-filters                 20 kHz





The parameters employed in this research were based on common practices in past studies.
The minimum and maximum band edges of the Mel filters are set according to human hearing capabilities [22].
The frame length of 100 ms is believed to be fine enough for the analysis of 10-second audio files. The frame
stride of 50 ms means that each frame overlaps the previous frame by 50%, which provides more detail
on feature changes between windows. The features should capture information across the whole spectrum
to allow maximum performance [23].
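Converting the time-based parameters to samples shows where the 50% overlap comes from. The helper below is illustrative and its name is hypothetical. It assumes the recorded rate of 44.1 kHz and rounds durations to whole samples.

```python
def frame_params(fs_hz, frame_length_s, frame_stride_s):
    """Frame length and hop in samples, and the fraction of each frame shared with the next."""
    frame_len = int(round(fs_hz * frame_length_s))
    hop = int(round(fs_hz * frame_stride_s))
    overlap = 1.0 - hop / frame_len
    return frame_len, hop, overlap


print(frame_params(44_100, 0.100, 0.050))  # (4410, 2205, 0.5)
```

A 4410-sample frame advanced by 2205 samples shares exactly half its samples with the next frame.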


4. RESULTS 

Distinct patterns were found in the Mel-log energy audio features of the different audible illegal
activities. The patterns are believed to be good fingerprints for training artificial intelligence for intrusion
detection. Figure 1 shows the features extracted from the forest ambience as a heat map. The heat map is a
10-second cut of forest ambience with 40 MLE per frame, where each frame of 40 MLE is obtained from
0.1 seconds of audio. Consecutive frames overlap by 50%, using a hop (stride) size of 0.05 seconds. Each frame
is multiplied by a Hamming window to reduce spectral leakage. The 50% overlap was important for detecting
changes between frames more effectively, compensating for the tapering of the Hamming window [24]. This
avoids losing meaningful features between frames, as Hamming windows emphasise the centre of the audio frame.
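The claim that 50% overlap compensates for the Hamming window's taper can be checked numerically. The sketch assumes the periodic form of the Hamming window. NumPy's `np.hamming` returns the symmetric form, for which the result holds only approximately. Two windows offset by half a frame sum to a constant 1.08. Every sample therefore receives the same total weight across frames, even though each single window falls to 0.08 at its edges.

```python
import numpy as np


def periodic_hamming(n):
    """Periodic Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / n)."""
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * k / n)


def overlap_sum(n, hop):
    """Sum of a window and its copy shifted by hop samples, over their shared span."""
    w = periodic_hamming(n)
    return w[hop:] + w[: n - hop]


# 100 ms frames at 44.1 kHz (4410 samples) with a 50 ms hop (2205 samples).
print(np.allclose(overlap_sum(4410, 2205), 1.08))  # True
```

With a hop of a quarter frame, the pairwise sum is no longer constant, so the flat weighting depends on the 50% choice.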
Ten seconds of audio produced 198 frames, and each frame contained 40 MLE. In total, 7920 floating-point
values were extracted. These features are visualized as heat maps using the Python plotting library
matplotlib.pyplot, as heat map visualization is an effective method of pattern analysis [25].
Figures 1, 2 and 3 show the heat maps of the extracted features for their respective audible intrusion activities.
The brighter, hot colour (yellow) shows the presence of a certain audio event, and the darker, cold
colour (blue) shows the less intense banks. Figure 1 shows the MLE heat map of the forest ambience,
the natural sound of the forest.
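The reported counts can be reproduced arithmetically for a 10-second clip at 44.1 kHz, with 4410-sample frames and a 2205-sample hop. The usual count of complete frames, floor((N - L) / H) + 1, gives 199. The reported 198 matches ceil((N - L) / H) instead. That difference is assumed to come from the framing convention of the extraction library; it has not been verified against the library here. Either way, 198 frames × 40 MLE = 7920 values.

```python
import math


def frames_conventional(n_samples, frame_len, hop):
    # Complete frames that fit inside the signal: floor((N - L) / H) + 1
    return (n_samples - frame_len) // hop + 1


def frames_ceil(n_samples, frame_len, hop):
    # Alternative convention without the "+ 1": ceil((N - L) / H)
    return math.ceil((n_samples - frame_len) / hop)


N = 10 * 44_100                          # 10 s at 44.1 kHz
print(frames_conventional(N, 4410, 2205))  # 199
print(frames_ceil(N, 4410, 2205))          # 198
print(frames_ceil(N, 4410, 2205) * 40)     # 7920
```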

Figure 2 shows the extracted features of the chainsaw activity. Banks 0-6 and 20-40 are clearly more
intense than the rest. The pattern is also consistent throughout the 10-second audio.
As such, it can be used as the audio event fingerprint for this particular activity, which is the sound
of a chainsaw being used to cut logs. Figure 3 shows the heat map of the hatchet activity, with a higher intensity
(shown as yellow) on the Mel-log filter banks in the range of 29-40. This pattern is also consistent throughout
the 10-second audio. The consistency of these feature patterns can be exploited in classifying the audio event.
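Because each activity's pattern stays consistent across the clip, a very simple classifier becomes possible. The following is a hypothetical sketch, not the authors' method, since the paper stops at visual analysis. Each clip is reduced to its time-averaged 40-bank MLE profile and assigned the label of the nearest template. The function names, the Euclidean distance, and building templates by averaging labelled clips are all assumptions.

```python
import numpy as np


def mle_profile(mle):
    """Time-averaged MLE: one mean log energy per filter bank."""
    return np.asarray(mle, dtype=float).mean(axis=0)


def classify_by_template(mle, templates):
    """Return the label whose template profile is closest (Euclidean) to the clip's.

    templates: dict mapping a label (hypothetical, e.g. "chainsaw") to a profile
    with one value per filter bank, e.g. averaged from labelled clips.
    """
    profile = mle_profile(mle)
    return min(templates, key=lambda label: np.linalg.norm(profile - templates[label]))
```

Averaging over time discards how the energies evolve within the clip. This is only reasonable because the patterns are reported to be steady over the 10 seconds.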
Figure 2. Extracted features heat map of 10-second chainsaw activity recorded
Figure 3. Extracted features heat map of 10-second hatchet activity recorded


Heat map representations ease the pattern analysis task, since a collection of heat maps can be
compared side by side. When the features were searched for patterns, some distinctive
patterns were easy to find. Table 2 gives an overview of the analysis. First, the intense regions,
compared with the less intense areas, show distinctive patterns that can be observed in specific ranges
of MLE filter banks. Second, the distribution range of MLE values (lower and upper limits) varies between
activities.


Table 2. Patterns of mel-log feature heatmaps 








Activity            Intense areas of MLE    Distribution range (lower-upper)
Chainsaw Revving    0-6 and 20-36           5-15
Forest Ambience     33-36                   6-21
Hatchet Cutting     0-10 and 33-36          2-12
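Table 2 was read off the heat maps by eye. One hypothetical way to make that reading repeatable has three parts. First, average each filter bank over time. Second, flag the banks above the clip's average level and report contiguous runs of flagged banks as the intense areas. Third, report the minimum and maximum MLE values as the distribution range. The above-average threshold and this interpretation of "distribution range" are assumptions, not the criteria used to build Table 2.

```python
import numpy as np


def intense_ranges(mle, threshold=None):
    """Contiguous runs of filter banks whose time-averaged log energy exceeds threshold.

    By default (an assumption) the threshold is the clip's average bank level.
    Bank indices are 0-based, as in Table 2.
    """
    bank_mean = np.asarray(mle, dtype=float).mean(axis=0)
    if threshold is None:
        threshold = bank_mean.mean()
    hot = bank_mean > threshold
    ranges, start = [], None
    for i, flag in enumerate(hot):
        if flag and start is None:
            start = i                      # a run of intense banks begins
        elif not flag and start is not None:
            ranges.append((start, i - 1))  # the run ended at the previous bank
            start = None
    if start is not None:
        ranges.append((start, len(hot) - 1))
    return ranges


def value_range(mle):
    """Lower and upper limits of the MLE values in a clip."""
    mle = np.asarray(mle, dtype=float)
    return float(mle.min()), float(mle.max())
```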





5. CONCLUSION 

This paper presents a Mel-log energies analysis of authentic audible intrusion activities in a Malaysian
forest. The objective of this research was to obtain authentic Malaysian forest environment audio and to
analyse the MLE audio features of audible intrusion activities. The study has obtained and analysed the MLE
features of the audio. Heat map representations were used to find patterns of illegal intrusion activities in
the extracted MLE features. The extracted MLE audio features were found to have patterns distinctive to their
respective audio sources. Hence, autonomous surveillance is believed to be possible using MLE audio
features and machine learning in a Malaysian forest environment. It is important to mention here
the obstacles faced during the data collection. The first obstacle was the equatorial weather, which is hot
and humid throughout the year with an average annual rainfall of up to 2500 mm. Recording equipment should be
waterproof or rainproof to avoid damage. Next, tropical insects might infest the equipment during
long-term recording sessions. The data collection was also affected because the present researcher
was not able to record gunshots as another audible intrusion activity, due to Malaysian national park policies
and limited provisions. The discovered patterns are deemed to enable artificial intelligence to monitor forest
intrusion activities. This may also be a feasible approach for wildlife protection in forest environments like
the Malaysian rainforest, and may improve the security of wildlife worldwide. It could be an initial step
towards employing intelligent security systems in wildlife protection. As discussed, this attempt may
have practical importance in initiating a new and cheaper solution.